Surprises in approximating Levenshtein distances
Similar resources
Approximating Nearest Neighbor Distances
Several researchers proposed using non-Euclidean metrics on point sets in Euclidean space for clustering noisy data. Almost always, a distance function is desired that recognizes the closeness of the points in the same cluster, even if the Euclidean cluster diameter is large. Therefore, it is preferred to assign smaller costs to the paths that stay close to the input points. In this paper, we c...
Obliviously Approximating Sequence Distances
There are several applications for schemes which approximately find the distance between two sequences in a way that is 'oblivious' of one of the sequences up until a final sublinear number of comparisons. This paper shows how sequences can be preprocessed obliviously to give a binary string, so that a simple vector distance between two bitstrings gives an approximation to a sequence distance of inte...
Levenshtein Distances Fail to Identify Language Relationships Accurately
The Levenshtein distance is a simple distance metric derived from the number of edit operations needed to transform one string into another. This metric has received recent attention as a means of automatically classifying languages into genealogical subgroups. In this article I test the performance of the Levenshtein distance for classifying languages by subsampling three language subsets from...
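As an illustration of the definition in the abstract above, here is a minimal sketch of the standard dynamic-programming computation. It assumes unit costs for insertion, deletion, and substitution of single characters; the function name `levenshtein` is chosen for illustration and is not taken from the article.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (unit costs assumed)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the current prefix of a
    # and the first j characters of b. Row 0 is the empty prefix of a.
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The sketch keeps only two rows of the table, so memory use grows with the length of the shorter string while running time remains proportional to the product of the two lengths.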
Approximating Subtree Distances Between Phylogenies
We give a 5-approximation algorithm to the rooted Subtree-Prune-and-Regraft (rSPR) distance between two phylogenies, which was recently shown to be NP-complete. This paper presents the first approximation result for this important tree distance. The algorithm follows a standard format for tree distances. The novel ideas are in the analysis. In the analysis, the cost of the algorithm uses a "cas...
Generating a bilingual lexical corpus using interlanguage normalized Levenshtein distances
Finding large numbers of target items for phonetic and phonological experiments can be a time-consuming and error-prone task. Using freely available tools and data, we have generated a bilingual corpus with the specific aim of investigating the processing and perception of stress in second-language (L2) words. Normalized Levenshtein distances between orthographic and phonemic transcriptions of ...
Journal
Journal title: Journal of Theoretical Biology
Year: 2006
ISSN: 0022-5193
DOI: 10.1016/j.jtbi.2006.06.026